AUTOMATED HISTOLOGICAL GRADING OF TUBULES



This invention relates to a method, an apparatus and a computer program for grading 
tubules: it is particularly (although not exclusively) relevant to assessment of histological 
slides to provide clinical information on potentially cancerous tissue such as breast cancer 
tissue.

Breast cancer is a common form of female cancer. Once a lesion indicative of breast
cancer has been detected, tissue samples are taken and examined by a histopathologist
to establish a diagnosis, prognosis and treatment plan. However, pathological analysis of
tissue samples is a time-consuming and inaccurate process. It entails interpretation of
images by human eye, which is highly subjective: it is characterised by considerable
inaccuracies in observations of the same samples by different observers and even by the
same observer at different times. For example, two different observers assessing the
same ten tissue samples may easily give different opinions for three of the slides, i.e. 30%
error. The problem is exacerbated by heterogeneity, i.e. complexity of some tissue sample
features.

There is a need to provide an objective measurement of tubules grading to support a 
pathologist's diagnosis and patients' treatment. 

The present invention provides a method of grading tubules in a first image of a 
histological slide characterised in that it has the steps of: 

a) providing a second image of first objects in the first image which are sufficiently 
large and of appropriate pixel value characteristics at boundaries to potentially be 
tubules, 

b) providing a third image of second objects in the first image having pixel value 
characteristics of holes within tubules and fat, 

c) combining data from the second and third images to identify selected second 
objects which are within first objects, 

d) performing one or more of the following: 

i) counting first objects in the first image which may potentially be tubules to 
provide a parameter NOB, 

ii) counting the first objects having selected second objects within them and 
likely to be tubules to provide a parameter N, 

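iii) determining the relative areas of selected second objects as proportions of
respective first objects within which they are located to provide parameters
RATIO,

iv) determining the total area of selected second objects as a proportion of
total area of first objects within which they are located to provide a
parameter SURF,

v) determining a parameter PERCENT = N/NOB, and

vi) counting the number of first objects containing at least medium sized holes
to provide a parameter T, and

e) grading the first image's tubules on the basis of the one or more parameters as
aforesaid with reference to parameter threshold values.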
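
In a preferred embodiment, the present invention provides a method of grading tubules in a
first image of a histological slide characterised in that it has the steps of:

a) providing a second image of first objects in the first image which are sufficiently
large and of appropriate pixel value characteristics at boundaries to potentially be
tubules and counting them to provide a parameter NOB,

b) providing a third image of second objects in the first image having pixel value
characteristics of fat and holes within tubules,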

c) combining data from the second and third images to identify selected second 
objects which are within first objects, 

d) counting the first objects having selected second objects within them and likely to
be tubules to provide a parameter N,

e) determining the relative areas of selected second objects as proportions of 
respective first objects within which they are located to provide parameters RATIO, 

f) determining the total area of selected second objects as a proportion of total area 
of first objects within which they are located to provide a parameter SURF, 

g) determining a parameter PERCENT = N/NOB,

h) counting the number of first objects containing at least medium sized holes to 
provide a parameter T, and 

i) grading the first image's tubules on the basis of the first image's parameters as 
aforesaid with reference to parameter threshold values. 



In this embodiment, the invention provides the advantage that it is an objective method
which grades tubules from a variety of parameters, yielding multiple gradings from which a
median can be derived.



In another aspect, the present invention provides computer apparatus for grading tubules 
in a first image of a histological slide characterised in that it is programmed to: 
a) compute a second image of first objects in the first image which are sufficiently
large and of appropriate pixel value characteristics at boundaries to potentially be
tubules,

b) compute a third image of second objects in the first image having pixel value
characteristics of holes within tubules and fat,

c) combine data from the second and third images to identify selected second objects
which are within first objects,

d) implement one or more of the following:

i) counting first objects in the first image which may potentially be tubules to
provide a parameter NOB,

ii) counting the first objects having selected second objects within them and
likely to be tubules to provide a parameter N,

iii) determining the relative areas of selected second objects as proportions of
respective first objects within which they are located to provide parameters
RATIO,

... tubules,

b) compute a third image of second objects in the first image having pixel value
characteristics of fat and holes within tubules,

c) combine data from the second and third images to identify selected second objects
which are within first objects,

d) implement one or more of the following:

i) counting first objects in the first image which may potentially be tubules to
provide a parameter NOB,

ii) counting the first objects having selected second objects within them and
likely to be tubules to provide a parameter N,

iii) determining the relative areas of selected second objects as proportions of
respective first objects within which they are located to provide parameters
RATIO,

iv) determining the total area of selected second objects as a proportion of
total area of first objects within which they are located to provide a
parameter SURF,

v) determining a parameter PERCENT = N/NOB, and

vi) counting the number of first objects containing at least medium sized holes
to provide a parameter T, and

e) grade the first image's tubules on the basis of the one or more parameters as
aforesaid with reference to parameter threshold values.

In a preferred embodiment of this aspect, the present invention provides a computer
program for grading tubules in a first image of a histological slide characterised in that it
contains instructions for controlling computer apparatus to:

a) provide a second image of first objects in the first image which are
sufficiently large and of appropriate pixel value characteristics at
boundaries to potentially be tubules and count them to provide a
parameter NOB,

b) provide a third image of second objects in the first image having pixel value
characteristics of fat and holes within tubules,

c) combine data from the second and third images to identify selected second
objects which are within first objects,

d) count the first objects having selected second objects within them and likely
to be tubules to provide a parameter N,

e) determine the relative areas of selected second objects as proportions of
respective first objects within which they are located to provide parameters
RATIO,

WO 2004/038633 PCT/GB2003/004527

f) determine the total area of selected second objects as a proportion of total
area of first objects within which they are located to provide a parameter
SURF,

g) determine a parameter PERCENT = N/NOB, 

h) count the number of first objects containing at least medium sized holes to 
provide a parameter T, and 

i) grade the first image's tubules on the basis of the first image's parameters
as aforesaid with reference to parameter threshold values.

The computer program and apparatus aspects of the invention may have preferred 
features corresponding to those of respective method aspects. 

In order that the invention might be more fully understood, embodiments thereof will now
be described, by way of example only, with reference to the accompanying drawings, in
which:

Figure 1 is a block diagram of a procedure of the invention for measuring tubule
activity;

Figure 2 is a block diagram showing part of the procedure of Figure 1 in more detail;
and

Figures 3 to 7 are simplified versions of images obtained during the Figure 2 procedure.

Referring to Figure 1, there is illustrated a procedure 10 of the invention for assessment of
tubule activity in tissue samples presented as histopathological slides of potential
carcinomas of the breast. The procedure 10 requires data from histological slides in a
suitable form. Sections are taken (cut) from breast tissue samples (biopsies) and placed
on respective slides. Slides are stained using the staining agent Haematoxylin & Eosin
(H&E), which is the standard stain for delineating tissue and cellular structure. Tissue
specimens stained with H&E are used to assess tubule activity. In the present example,
image data were obtained by a pathologist using a Zeiss Axioskop microscope with a
Jenoptiks Progres 3012 digital camera. Image data from a slide is a set of digital images
obtained at a linear magnification of 10 (i.e. 10X).

To select images, a pathologist scans the microscope over a slide, and at 10X
magnification selects two regions (referred to as tiles) of the slide which appear to be most
promising in terms of an analysis to be performed. Both these regions are then
photographed using the microscope and digital camera referred to above. The digital
camera produces for each region a respective digitised image in three colours, i.e. red,
green and blue (R, G & B): each pixel has eight bit R, G and B values, each therefore in
the range 0 to 255, and each image is an electronic equivalent of a tile. Three intensity
values are obtained for each pixel in a pixel array to provide an image as a combination
of R, G and B image planes. The image data from the two tiles are stored in a database 12
for later use.

Tubule activity is determined using a tubule feature detection process 14: this provides a
tubule score 18 for input to a diagnostic report at 20.

The objective of the procedure 10 is to perform an extraction and a count of the tubules
present in an image. A tubule is an image of a section through a mammary duct produced
in the slide production process: it can appear round, oval, cylindrical or irregular
depending on the angle of the section to the duct axis and the shape of the duct after
sectioning. It appears as a white area surrounded by a dark epithelial layer (or boundary).
A tubule score is 1, 2 or 3 according to whether the condition is least, moderately or most
serious respectively. Tubules may be fewer in number or absent in images scored 2 or 3
because cancerous cells are invading them. The procedure 10 seeks to identify those
white areas in an image that are surrounded by a dark epithelial layer: this should
exclude fat, which also appears white but tends not to be surrounded by a darker
epithelial layer.

In a prior art manual procedure, a clinician places a slide under a microscope and
examines a region of it at magnification of 10X for indications of tubule activity. The prior
art manual procedure for scoring tubule activity involves a pathologist subjectively
estimating the amount of tubules present in a tissue sample, taking care to ignore those
areas considered to be fat cells. The process described below in this example replaces
the prior art manual procedure with an objective procedure.

Referring now to Figure 2, the process 14 is shown in more detail: it is carried out for each
of the two digitised images mentioned above, but will be described for one. At 30 an input
colour image is used to calculate a greyscale equivalent, i.e. for each pixel the respective
red, green and blue pixel values are averaged to produce a single greyscale pixel value.
This is repeated for all pixels in the image to provide for further processing a greyscale
initial image 50 as shown in Figure 3 in simplified form. Figure 3 illustrates various kinds
of initial image feature: tubules 51 with pronounced dark boundaries containing one or
more holes such as 52, large dark objects 53, fat cells 55 with only very narrow faint
boundaries and small dark objects 56 of no interest.
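
The greyscale conversion at 30 can be sketched as follows. This is a minimal illustration in Python rather than the software used in this example; the function name to_greyscale, the list-of-rows image layout and the sample tile are assumptions made purely for illustration.

```python
def to_greyscale(rgb_image):
    """Average the R, G and B values of every pixel.

    rgb_image is a list of rows; each pixel is an (R, G, B) tuple of
    eight bit values in the range 0 to 255 (layout assumed here).
    """
    return [[(r + g + b) / 3.0 for (r, g, b) in row] for row in rgb_image]


# Hypothetical 2 x 2 tile, not taken from any slide.
tile = [[(255, 255, 255), (30, 60, 90)],
        [(0, 0, 0), (200, 100, 0)]]
grey = to_greyscale(tile)
```

Each greyscale value remains in the range 0 to 255, so white areas such as holes and fat stay near 255 and dark boundaries stay near 0.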

Step 32 is shown in more detail as a number of constituent steps a) to e) within chain lines
33. At a), the objective is to select from the greyscale image 50 only relatively darker
pixels and omit relatively lighter pixels. An image is obtained from a) which contains
relatively darker objects of varying size: the larger of these objects are likely to contain
tubules, but others smaller in size are unlikely to do so. To implement step a), initially all
pixels in the greyscale image 50 are compared with one another to obtain their maximum
(Maxg) and minimum (Ming) values: these values are used to compute a parameter P
given by:

P = 12(Maxg - Ming)/100 (1)

Each pixel in the greyscale image 50 is then divided by 255 so that it lies in the range 0 to
1. The value P is then used to transform the greyscale image 50 into an output image as
shown in Table 1 below:



Table 1

Input image pixel value              Output image pixel value
≤ 1/255                              0
> 1/255 AND < (Maxg - P)/255         Mapped to the range [0,1]
≥ (Maxg - P)/255                     1

This means 1/255 becomes 0, (Maxg - P)/255 becomes 1 and x > 1/255 AND < (Maxg -
P)/255 becomes (x - 1/255)/((Maxg - P)/255 - 1/255). The resulting output image values are
now thresholded to produce a binary image: all output image pixel values less than a
threshold value of 0.85 are set to zero, and all other pixel values are set to 1. The
threshold of 0.85 was arrived at experimentally using trial images.
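
The mapping of Table 1 and the subsequent thresholding can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation used in this example: the function name stretch_and_threshold, its argument names and the sample pixel values are hypothetical, and the boundary values 1/255 and (Maxg - P)/255 are assumed to map to 0 and 1 respectively.

```python
def stretch_and_threshold(grey, percent=12.0, threshold=0.85):
    """Apply Equation (1), the Table 1 mapping and the 0.85 threshold.

    grey is a list of rows of greyscale values in the range 0 to 255.
    Returns a binary image (list of rows of 0s and 1s).
    """
    flat = [v for row in grey for v in row]
    max_g, min_g = max(flat), min(flat)
    p = percent * (max_g - min_g) / 100.0      # Equation (1)
    low = 1.0 / 255.0                          # maps to 0
    high = (max_g - p) / 255.0                 # maps to 1
    binary = []
    for row in grey:
        out_row = []
        for v in row:
            x = v / 255.0                      # scale pixel to range 0 to 1
            if x <= low:
                y = 0.0
            elif x >= high:
                y = 1.0
            else:                              # here low < x < high
                y = (x - low) / (high - low)
            out_row.append(1 if y >= threshold else 0)
        binary.append(out_row)
    return binary


# Hypothetical greyscale values chosen only to exercise each branch.
sample = [[0, 100], [200, 250]]
binary_sample = stretch_and_threshold(sample)
```

For the sample values, Maxg = 250 and Ming = 0 give P = 30, so the upper limit of the mapped range is 220/255.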

... more detail at f), g) and h) ...

Input image value                    Output image value
< (maximum - Q)/255                  0
≥ (maximum - Q)/255 AND ≤ 1.0        Mapped to the range [0,1]

... where both ANDed pixels are 1, and is 0 otherwise. Also at h), the result of the AND
operation is smoothed. 

Figure 6 shows a binary image 80 resulting from the AND operation between the images
60 and 70 and subsequent smoothing at h). The image 80 has 1s only at locations 81
(shown cross-hatched) corresponding (see image 50) to central holes such as 52 in
potential tubules 51. This has the effect of removing objects which correspond in the
original greyscale image 50 to isolated white islands (usually fat cells) which are not
surrounded by a dark boundary.
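
The pixel-wise AND at h) can be sketched as follows. This is a minimal Python illustration only: the function name binary_and and both sample images are hypothetical, and the smoothing applied after the AND is omitted because its form is not specified here. The first sample image is assumed to play the role of image 60 (candidate objects) and the second that of image 70 (light regions).

```python
def binary_and(image_a, image_b):
    """Return 1 where both input pixels are 1, and 0 otherwise."""
    return [[1 if (a == 1 and b == 1) else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


# Hypothetical candidate objects: one blob in the top left corner.
objects_image = [[1, 1, 1, 0],
                 [1, 1, 1, 0],
                 [0, 0, 0, 0]]
# Hypothetical light regions: one inside the blob, one isolated island.
light_image = [[0, 1, 0, 1],
               [0, 0, 0, 1],
               [0, 0, 0, 0]]
holes_image = binary_and(objects_image, light_image)
```

The isolated light island in the right hand column does not coincide with any candidate object and is therefore removed, which mirrors the removal of fat cells described above.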

At 36 a respective connected component labelling (CCL) operation is applied to each of
the images 60 and 80 resulting from e) and h): CCL is a known image processing
technique (sometimes referred to as 'blob colouring') published by Klette R., Zamperoni
P., 'Handbook of Image Processing Operators', John Wiley & Sons, 1996, and Rosenfeld
A., Kak A.C., 'Digital Picture Processing', vols. 1 & 2, Academic Press, New York, 1982. It
gives different numerical labels to objects (blobs) in a binary image containing 0s and 1s
only, objects being groups of contiguous or connected pixels of 1: each object is assigned
a number (label) different to others to enable it to be distinguished. CCL labels objects
with numbers beginning with 1, so the numbers of the highest numbered objects in the
images 60 and 80 are respectively the number of objects which might potentially be
tubules and the number of holes previously within dark areas.
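
Connected component labelling of a binary image can be sketched as follows. This is a minimal Python illustration using a breadth-first flood fill, not the routine used in this example; the function name label_components and the sample image are hypothetical, and 4-connectivity (horizontal and vertical neighbours only) is an assumption, since the connectivity used is not stated here.

```python
from collections import deque


def label_components(binary):
    """Label groups of connected 1 pixels with numbers beginning with 1.

    Returns (labels, count): labels has the same shape as binary, with 0
    for background, and count is the highest label assigned.
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or labels[r][c] != 0:
                continue                        # background or already labelled
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:                        # flood fill the current object
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical binary image containing three separate objects.
blobs = [[1, 1, 0, 0],
         [0, 0, 0, 1],
         [1, 0, 0, 1]]
blob_labels, blob_count = label_components(blobs)
```

Because labels begin with 1 and increase by one per object, the returned count equals the number of objects in the image.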

A tubule may contain one or more holes, and this is required to be detected to avoid an
incorrect tubule count. At 38, each pixel in the h) image 80 is multiplied by the
corresponding pixel in the same location in the CCL of image 60, which is a colour image
when displayed on a colour monitor because CCL gives different colours to different
objects. Figure 7 shows an image 90 resulting from multiplication at 38: the image 90
does not have an image feature corresponding to the lower left hand blob in Figure 4,
because it has been eliminated by multiplication by 0s in corresponding locations in the
image 80. The image 90 retains features 91, 92 and 93a to 93c corresponding to tubule
holes such as 52 in the initial image 50. In this example images were processed using
computer software referred to as "Matlab®" produced by Mathworks Inc., an American
corporation. A Matlab function "ismember" is used to identify holes 91 and 92 associated
with respective single tubules that have different labels, and holes 93a to 93c all
associated with the same tubule that have the same label albeit different to those of holes
91 and 92. ... is dotted and like-labelled holes 93a to 93c are all cross-hatched.
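
The multiplication at 38 and the grouping of holes by parent object can be sketched as follows. This is a minimal Python illustration and not the Matlab code of this example: the function name holes_by_object and both sample images are hypothetical, and a dictionary keyed by object label is used here in place of the "ismember" function.

```python
def holes_by_object(hole_mask, object_labels):
    """Multiply a binary hole image by a labelled object image.

    Returns (product, hole_area): product gives each hole pixel the label
    of the object containing it, and hole_area maps each such label to the
    number of hole pixels carrying it.
    """
    product = [[h * lab for h, lab in zip(hole_row, label_row)]
               for hole_row, label_row in zip(hole_mask, object_labels)]
    hole_area = {}
    for row in product:
        for value in row:
            if value != 0:
                hole_area[value] = hole_area.get(value, 0) + 1
    return product, hole_area


# Hypothetical labelled objects 1, 2 and 3.
labelled = [[1, 1, 1, 0, 2, 2],
            [1, 1, 1, 0, 2, 2],
            [0, 0, 0, 0, 0, 0],
            [3, 3, 3, 3, 3, 0]]
# Hypothetical holes: one in object 1, two in object 3, none in object 2.
holes = [[0, 1, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 1, 0, 1, 0, 0]]
product, hole_area = holes_by_object(holes, labelled)
objects_with_holes = len(hole_area)
```

Holes sharing a label belong to the same object, so an object with several holes is counted once only.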

The following parameters are now derived:

N = Number of objects containing one or more holes (N = 3 in image 90)

NOB = Number of objects with and without holes (NOB = 4 in image 60)

SURF = (total holes area)/(total objects area) = ratio of total surface area of holes to
total surface area of objects including their holes

PERCENT = N/NOB = ratio of number of objects with holes to number of objects with and
without holes

RATIO = (hole area)/(object area) = for each object, the ratio of the area of its hole(s) to
its area including its hole(s): RATIO is large for objects (tubules) with relatively large
holes and small for objects with relatively small holes compared to object size

T = Number of objects containing medium to large holes

These parameters are used in five tests. The tests are independent: any one on its own or a
combination of two or more can be used to provide a tubule score, but use of all five tests
to provide a composite score produces better results.
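
Derivation of the parameters from object and hole areas can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name tubule_parameters, the dictionary inputs and the sample areas are hypothetical. SURF is computed here over the objects that contain holes, and PERCENT is returned as the plain quotient N/NOB; both choices are assumptions of this sketch. The value 0.09 is the RATIO limit given for medium to large holes in Test 2.

```python
def tubule_parameters(object_area, hole_area, medium_hole_ratio=0.09):
    """Derive N, NOB, SURF, PERCENT, RATIO and T.

    object_area maps every object label to its pixel area including holes.
    hole_area maps the label of each object containing holes to the total
    pixel area of its holes.
    """
    nob = len(object_area)                     # objects with and without holes
    n = len(hole_area)                         # objects with one or more holes
    ratio = {lab: hole_area[lab] / object_area[lab] for lab in hole_area}
    holed_area = sum(object_area[lab] for lab in hole_area)
    surf = sum(hole_area.values()) / holed_area if holed_area else 0.0
    percent = n / nob if nob else 0.0
    t = sum(1 for value in ratio.values() if value > medium_hole_ratio)
    return {"N": n, "NOB": nob, "SURF": surf, "PERCENT": percent,
            "RATIO": ratio, "T": t}


# Hypothetical pixel areas for four objects, two of which contain holes.
areas = {1: 100, 2: 50, 3: 200, 4: 80}
hole_areas = {1: 10, 3: 4}
params = tubule_parameters(areas, hole_areas)
```

With these sample areas two of the four objects contain holes, and only object 1 has a RATIO above 0.09.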

Test 1: PERCENT is compared to 12 and 20. A high PERCENT > 20 has been observed to
characterise score 1 images, while a low PERCENT < 12 characterises score 3 images;
otherwise, i.e. if 12 ≤ PERCENT ≤ 20, an image score of 2 is indicated.

Test 2: T = Number of objects containing medium to large holes, i.e. for which
RATIO > 0.09, is compared to threshold values of 2 and 5: T < 2, i.e. T = 1 or 0, indicates
an image score of 3, whereas T = 5 or greater indicates score 1; otherwise, i.e. if T = 2, 3
or 4, the score is more likely to be 2.

Test 3: RATIO is large for objects (tubules) with relatively large holes and small for objects
with relatively small holes compared to object size. RATIO is compared to threshold
values of 0.07 and 0.03. When holes are large corresponding to image score 1, RATIO is
likely to be higher than 0.07; when holes are small corresponding to image score 3,
RATIO is likely to be below 0.03. Otherwise, i.e. if 0.03 ≤ RATIO ≤ 0.07, the indicated
image score is 2.

Test 4: N = Number of objects containing one or more holes (in image 90, N = 3)
corresponding to tubules. N is compared to 20 and 11; N > 20 indicates a score 1 image
and N < 11 a score 3 image; otherwise, i.e. if 11 ≤ N ≤ 20, the indicated image score is 2.

Test 5: SURF: If SURF is greater than 0.001, the total area of holes is large indicating an
image score of 1; if SURF is less than or equal to 0.0002, the total area of holes is small
indicating an image score of 3; otherwise, i.e. if SURF is greater than 0.0002 but not
greater than 0.001, an image score of 2 is indicated.

If PERCENT is greater than or equal to 600 the other tests are ignored and the image
score is graded as 3. If PERCENT is less than 600, the final tubule score for an image is
taken as the median value of five scores obtained respectively from the above five tests.
When two images are used, two results for each test are obtained: the mean of these two
results is computed and used with corresponding mean values from the other tests to
derive a median value over the test results.

The invention was tested on a set of 206 images acquired at low magnification 10X: the
data obtained ...

Score 1      31     ...    ...
Score 2      ...    17     30
Score 3      5      6      101

... images and compared with a pathologist's scores with the following results:

- 72.3 % (149) images were graded by the classifier process of the invention
with a score in agreement with a pathologist's score;

- 21.8 % (45) images were graded by the classifier process of the invention with
a score differing by 1 from a pathologist's score; and

- 5.8 % (12) images were graded by the classifier process of the invention with a
score differing by 2 from a pathologist's score.

The image dataset contained more than three times as many score 3 images (138) ...
images (24). Best results were obtained for images scored 3 by pathologists with a
73.2% correct classification: this might be attributable to the high number of ...
which is at least 70% correct was achieved for images of all three scores 1, 2 and 3. Since
... as it is possible to do so, the invention is verified as regards all three scores. The average
time to compute results for an image was estimated to be in the range 20 to 40 seconds.

... can clearly be implemented by an appropriate computer program recorded on a carrier
medium and running on a conventional computer system. Such a program is ... a number of
the processing functions are commercially available as indicated, and others are well known
...



